Conversation
|
One test is failing CI (test_duckdb_url_import). Seems to be a download error and not due to changes in this PR. |
aa27a48 to
173ddb9
Compare
|
Hmm, it's my first time to see this error. I've merged a PR that bumps BTW, thanks for working on this! |
81a85ab to
43a2f35
Compare
384526c to
8c2e79a
Compare
|
@chinmay-bhat @Fokko @HonahX Sorry for the last-minute feedback, but wouldn't it be better if we explicitly pass |
| ) | ||
|
|
||
|
|
||
| @lru_cache |
There was a problem hiding this comment.
what are your thoughts on this, wouldn't be clearer if we could explicitly pass the max size for the lru_cache annotation, to store max entries rather than going default.
@lru_cache(maxsize=128)
There was a problem hiding this comment.
As the default is 128, I don't think we need to explicitly define maxsize=128.
Also, _manifests() is not a public API, so we would probably not set the cache size through user config via pyiceberg.yaml. Is there a use-case where the user would want to set the cache size?
There was a problem hiding this comment.
Agree the default size is what you mentioned, It was just an example, but the idea was to make it more readable & configurable.
Please feel free to take your call.
There was a problem hiding this comment.
I think 128 should be enough. We're caching manifest_list here and the number of manifest lists should be <100 in most cases.
We can always make this configurable in the future :)
|
I'm not sure why Is this a duck db problem? Or do I need to open a new PR (from main branch + my changes) to resolve it? |
|
Being able to configure (and also disable) the cachine would be a very nice
touch
Op ma 10 jun 2024 om 09:52 schreef Chinmay Bhat ***@***.***>
… ***@***.**** commented on this pull request.
------------------------------
In pyiceberg/table/snapshots.py
<#787 (comment)>
:
> @@ -228,6 +229,13 @@ def __eq__(self, other: Any) -> bool:
)
***@***.***_cache
As the default is 128
<https://docs.python.org/3/library/functools.html#functools.lru_cache>, I
don't think we need to explicitly define maxsize=128.
Also, _manifests() is not a public API, so we would *probably* not set
the cache size through user config via pyiceberg.yaml. Is there a use-case
where the user would want to set the cache size?
—
Reply to this email directly, view it on GitHub
<#787 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAIU5KBBXVDMDRY62QQEY2LZGVLMNAVCNFSM6AAAAABIUZG46GVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDCMBWHEZTSOJSGQ>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
|
@chinmay-bhat Looks like the CI is still a bit sad :( |
Hi @Fokko, I'm curious to know why we would want to support custom sized manifest caches. Here's my initial implementation for configuring and disabling cache: <top of file>
MANIFEST_CACHE_SIZE = "manifest-cache-size"
config = Config()
manifest_cache_size = config.get_int(MANIFEST_CACHE_SIZE) if config.get_int(MANIFEST_CACHE_SIZE) else 128
....
@lru_cache(maxsize=manifest_cache_size)
def _manifests(io: FileIO, manifest_list: str) -> List[ManifestFile]:
"""Return the manifests from the manifest list."""
file = io.new_input(manifest_list)
return list(read_manifest_list(file))
class Snapshot:
....
def manifests(self, io: FileIO, use_cache: bool = True) -> List[ManifestFile]:
"""Return the manifests for the given snapshot."""
if self.manifest_list:
if use_cache:
return _manifests(io, self.manifest_list)
else:
_manifests.cache_clear()
return _manifests.__wrapped__(io, self.manifest_list)
return []Is this in the right direction?
@Fokko @HonahX I need help resolving this error. Error seems unrelated to the changes in this PR. I've added more info in this comment. |
I tried this PR locally and did not observe this issue. My testing platforms are
It seems this issue can only be reproduced in github action😱:
|
|
I created a new PR against my fork, and once the GitHub actions failed, I manually re-tried them. With
Also, with |
|
FYI this is caused by transient networking failures when downloading the duckdb iceberg extension. Using the pre-release duckdb library ( |
|
#1149 bumps duckdb to v1.1.0 which has the PR to retry extension installation |
|
@chinmay-bhat do you mind rebasing off |
8c2e79a to
a54c570
Compare
|
rebased off main! |
kevinjqliu
left a comment
There was a problem hiding this comment.
LGTM! CI finally passed!
Thanks for the contribution
This reverts commit 1971fcf.
* cache manifests * update API * small fix * move cache to module level * update signature and check
* cache manifests * update API * small fix * move cache to module level * update signature and check
Closes #595.